npj Digital Medicine

85 training papers 2019-06-25 – 2026-03-07

Top medRxiv preprints most likely to be published in this journal, ranked by match strength.

1
A Virtual Patients Ensemble Approach for Predicting Surgical Complications
#1 (42.6%)

AI has shown promise in predicting surgical complications, but most existing models estimate overall risk levels rather than identifying the specific complications an individual patient may develop. We present an AI agent that uses a Virtual Patients Ensemble (VPE) approach to generate individualized predictions of surgical complications from unstructured case descriptions. The agent applies structured reasoning to extract diagnoses, surgical procedures, and risk factors from clinical narratives...

2
Prompt injection attacks on vision-language models for surgical decision support
#1 (42.4%)

Importance: Artificial intelligence-driven analysis of laparoscopic video holds potential to increase the safety and precision of minimally invasive surgery. Vision-language models are particularly promising for video-based surgical decision support due to their capabilities to comprehend complex temporospatial (video) data. However, the same multimodal interfaces that enable such capabilities also introduce new vulnerabilities to manipulations through embedded deceptive text or images (prompt inj...

3
Surgical Information Assistant: A technical report on an agentic information retrieval System for surgical information
#1 (40.9%)

We present the Surgical Information Assistant, an agentic retrieval-augmented generation (RAG) system designed to improve access to surgical knowledge in resource-constrained settings. Built on the Open Manual of Surgery for Resource-Limited Settings, the assistant uses a retrieval method we call DeRetSyn (Decompose-Retrieve-Synthesize). We evaluate DeRetSyn using automated metrics and partial human validation across 14,500 synthesized question-answer pairs and find that it achieves 63% top-1 a...

4
Enhancing Fairness in Diabetes Prediction Systems through Smart User Interface Design
2025-06-05 medical ethics 10.1101/2025.06.04.25328959
#1 (30.8%)

Objectives: Artificial intelligence (AI) in chronic disease prediction often exhibits algorithmic biases, hindering equitable healthcare delivery. This study aims to develop and evaluate a Smart User Interface (Smart UI) framework that enhances fairness in diabetes prediction systems by operationalizing fairness at the human-computer interaction level, a dimension frequently overlooked in AI fairness research. Materials and Methods: We employed a nine-metric fairness evaluation framework across fou...

5
Medical Hallucination in Foundation Models and Their Impact on Healthcare
2025-03-03 health systems and quality improvement 10.1101/2025.02.28.25323115
#1 (30.7%)

Hallucinations in foundation models arise from autoregressive training objectives that prioritize token-likelihood optimization over epistemic accuracy, fostering overconfidence and poorly calibrated uncertainty. In clinical settings, where profound knowledge asymmetry exists between AI systems and end-users, undetected misinformation such as fabricated medications, contraindicated drug recommendations, or false imaging interpretations poses direct patient safety risks. We define medical hallu...

6
AI-generated data contamination erodes pathological variability and diagnostic reliability
2026-01-22 health informatics 10.64898/2026.01.19.26344383
#1 (30.6%)

Generative artificial intelligence (AI) is rapidly populating medical records with synthetic or partially AI-generated content, creating a feedback loop where future models are increasingly at risk of training on uncurated AI-generated data. However, the clinical consequences of this AI-generated data contamination remain unexplored. Here, we show that in the absence of mandatory human verification, this self-referential cycle drives a rapid erosion of pathological variability and diagnostic rel...

7
Decentralized, privacy-preserving surgical video analysis with Swarm Learning
#1 (26.5%)

Background: Progress in artificial intelligence-based analysis of surgical videos has been constrained by reliance on manual frame-level annotations rather than patient-level outcomes. In addition, concerns about data privacy restrict the exchange of laparoscopic video data and, thereby, multicenter collaboration. Methods: To address these limitations, we developed a pipeline that integrates weakly supervised deep learning with Swarm Learning, a decentralized machine learning approach that enables ...

8
Automatic Physical Examination Segmentation within Objective Structured Clinical Examination Videos
2025-04-05 medical education 10.1101/2025.04.03.25325195
#1 (26.4%)

Objective: Assessing medical student performance in Objective Structured Clinical Examinations (OSCEs) is labor-intensive, requiring trained evaluators to review 15-minute-long videos. The physical examination period constitutes only a small portion of these videos. Automated segmentation of OSCE videos could significantly streamline the evaluation process by detecting this physical exam portion for targeted evaluation. Current video analysis approaches struggle with these long recordings due to ...

9
Hippocrates-o1: A Guideline-Aware, Orchestrated, Self-Refining Protocol for Specialty-Specific Clinical Reasoning
#1 (26.2%)

Background: Clinical decision support requires language models that provide guideline-aligned, context-aware reasoning with clear justification. Many existing benchmarks emphasize multiple-choice or short-form question answering and mainly capture factual recall rather than longitudinal clinical reasoning from extended clinical notes. Hippocrates-o1 is a family of domain-tailored clinical reasoning pipelines that combine structured prompts, guideline-informed retrieval, and iterative self-refineme...

10
Evaluating the AI Potential as a Safety Net for Diagnosis: A Novel Benchmark of Large Language Models in Correcting Diagnostic Errors
2026-02-24 health systems and quality improvement 10.64898/2026.02.22.26346832
#1 (25.6%)

Background: Diagnostic errors are a leading cause of preventable patient harm, often occurring during early clinical encounters where diagnostic uncertainty is maximal. Large language models (LLMs) have shown potential in medical reasoning, yet their ability to function as a diagnostic safety net, specifically by identifying and correcting human diagnostic errors, remains systematically unquantified. We evaluated whether state-of-the-art LLMs can effectively challenge, rather than merely confirm, ...

11
The Neurosurgical Uncertainty Index: Self-Doubting AI for rare or unexpected surgical complications
#1 (25.4%)

Rare or unexpected postoperative neurosurgical complications pose a challenge due to clinical variability and gaps in available data. We introduce the Neurosurgical Uncertainty Index (NUI), an uncertainty-aware AI framework that integrates bootstrap sampling for aleatoric uncertainty, isolation forest anomaly detection, and clinical calibration to predict and stratify risks for 13 complications. NUI distinguishes between data-driven and model-driven uncertainty and highlights cases that conventi...

12
Dissection of medical AI reasoning processes via physician and generative-AI collaboration
2023-05-16 dermatology 10.1101/2023.05.12.23289878
#1 (25.1%)

Despite the proliferation and clinical deployment of artificial intelligence (AI)-based medical software devices, most remain black boxes that are uninterpretable to key stakeholders including patients, physicians, and even the developers of the devices. Here, we present a general model auditing framework that combines insights from medical experts with a highly expressive form of explainable AI that leverages generative models, to understand the reasoning processes of AI devices. We then apply ...

13
Evaluating the Influence of Demographic Identity in the Medical Use of Large Language Models
2025-07-11 medical ethics 10.1101/2025.07.09.25331072
#1 (24.8%)

As large language models (LLMs) are increasingly adopted in medical decision-making, concerns about demographic biases in AI-generated recommendations remain unaddressed. In this study, we systematically investigate how demographic attributes, specifically race and gender, affect the diagnostic, medication, and treatment decisions of LLMs. Using the MedQA dataset, we construct a controlled evaluation framework comprising 20,000 test cases with systematically varied doctor-patient demographic pair...

14
PaiX Net: A Next-Generation Second-Opinion Platform for Pathology
2026-02-09 pathology 10.64898/2026.02.04.26345344
#1 (24.7%)

Pathology faces persistent challenges including a global shortage of specialists, uneven access to expertise, increasing diagnostic complexity, and a growing need for second-opinion consultations. While digital and telepathology platforms address parts of this problem, existing solutions often trade accessibility for structured, workflow-aware clinical integration. At the same time, multimodal medical AI shows promise for diagnostic support but raises concerns regarding transparency, automation ...

15
CardiacGPT: A Real-Time AI Assistant for Intraoperative Guidance and Postoperative Decision Support in Cardiac Surgery
#1 (24.6%)

Background: Cardiac surgery is one of the most complex and high-stakes areas of medicine, where intraoperative decisions must be made within seconds and incomplete information can compromise outcomes. Traditional risk scores and rule-based decision support tools provide limited real-time guidance and rarely integrate the unstructured data streams available during surgery. Recent advances in large language models (LLMs) such as OpenAI's GPT-5 and Anthropic's Claude 3.5 family have demonstrated state-...

16
Probing the Surgical Competence of LLMs: A global health study leveraging AfriMedQA benchmarks
#1 (24.5%)

Global surgical care faces a severe workforce shortage, with more than 1.2 million additional specialists needed by 2030, particularly in low- and middle-income countries (LMICs). Large language models (LLMs) have demonstrated impressive medical reasoning on standardized exams, but their safety, reliability, and specialty-specific performance, especially in procedural fields such as surgery, remain uncertain. Here we evaluate over 40 state-of-the-art LLMs on 3,900 expert-authored multiple-choice...

17
Toward Digital Twins in the Intensive Care Unit: A Medication Management Case Study
2024-12-28 intensive care and critical care medicine 10.1101/2024.12.20.24319170
#1 (24.5%)

Objective: To evaluate the efficacy of digital twins developed using a large language model (LLaMA-3), fine-tuned with Low-Rank Adapters (LoRA) on ICU physician notes, and to determine whether specialty-specific training enhances treatment recommendation accuracy compared to other ICU specialties or zero-shot baselines. Materials and Methods: Digital twins were created using LLaMA-3 fine-tuned on discharge summaries from the MIMIC-III dataset, where medications were masked to construct training and...

18
A Mobile AI-enhanced Platform for Standardized Wound Assessment and Clinical Decision Support
2026-01-23 dermatology 10.64898/2026.01.22.26344407
#1 (24.4%)

Chronic wounds affect over 1.2 million Canadians and incur healthcare costs exceeding $13 billion annually, with global expenditures approaching $149 billion. Current clinical practice relies on manual measurements and subjective visual evaluations, which overestimate wound area by up to 40% and demonstrate poor-to-moderate inter-rater reliability. This variability complicates longitudinal monitoring and evidence-based treatment selection. We developed and evaluated an integrated mobile platform...

19
Computer Vision-Based Retrieval of Steps and Errors in Laparoscopic Cholecystectomy
#1 (24.2%)

Traditional surgical training relies heavily on hands-on experiences gained through relatively infrequent procedures during apprenticeships. Recently, postoperative review has become a valuable supplement to this model, offering learning opportunities outside the operating room. However, its adoption remains limited due to its inefficiencies. In this study, we developed a Computer Vision-based system designed to efficiently navigate and retrieve critical segments from laparoscopic cholecystectom...

20
MedOS: AI-XR-Cobot World Model for Clinical Perception and Action
2026-02-23 health informatics 10.64898/2026.02.18.26345936
#1 (24.1%)

Medicine historically separates abstract clinical reasoning from physical intervention. We bridge this divide with MedOS, a general-purpose embodied world model. Mimicking human cognition via a dual-system architecture, MedOS demonstrates superior reasoning on biomedical benchmarks and autonomously executes complex clinical research. To extend this intelligence physically, the system simulates medical procedures as a physics-aware model to foresee adverse events. Generating and validating on the...